Attenuated but immunostimulatory Mycobacterium tuberculosis variant bovis strain Ravenel shows variation in T cell epitopes

Tuberculosis, caused by Mycobacterium tuberculosis complex (MTBC) organisms, affects a range of humans and animals globally. Mycobacterial pathogenesis involves manipulation of the host immune system, partially through antigen presentation. Epitope sequences across the MTBC are evolutionarily hyperconserved, suggesting their recognition is advantageous for the bacterium. Mycobacterium tuberculosis var. bovis (MBO) strain Ravenel is an isolate known to provoke a robust immune response in cattle, but typically fails to produce lesions and persist. Unlike attenuated MBO BCG strains that lack the critical RD1 genomic region, Ravenel is classic-type MBO structurally, suggesting genetic variation is responsible for defective pathogenesis. This work explores variation in epitope sequences in MBO Ravenel by whole genome sequencing, and contrasts such variation against a fully virulent clinical isolate, MBO strain 10-7428. Validated MTBC epitopes (n = 4818) from the Immune Epitope Database were compared to their sequences in MBO Ravenel and MBO 10-7428. Ravenel yielded 3 modified T cell epitopes, in genes rpfB, argC, and rpoA. These modifications were predicted to have little effect on protein stability. In contrast, T cells epitopes in 10-7428 were all WT. Considering T cell epitope hyperconservation across MTBC variants, these altered MBO Ravenel epitopes support their potential contribution to overall strain attenuation. The affected genes may provide clues on basic pathogenesis, and if so, be feasible targets for reverse vaccinology.

Over 100 years since vaccination by M. tuberculosis variant bovis (MBO) strain BCG began, tuberculosis remains the deadliest single infectious agent in the world for humans. Our understanding of Mycobacterium tuberculosis complex (MTBC) pathogenesis remains lacking, both in humans and in the myriad non-human hosts MTBC variants attack. An underlying mechanism behind this is the subversion and misdirection of the host immune response, achieved both directly by targeted alteration of host kinase signaling cascades, and indirectly by intentional presentation of conserved antigens that drive specific immune feedback [1][2][3][4] . In a background of mycobacterial antigenic hyperconservation, this research asked what level of epitope variation can be detected in each an attenuated and a virulent MBO strain, and if variation may signal loss of virulence through dysregulation of host immune control.
At a simplistic level, a host's immune response to infection-including recognition of pathogen markers, response by innate and adaptive immunity, and ultimate pathogen clearance-depends on the receipt of signals differentiating self and foreign molecules. Antigens here generically refer to such markers that a host can react to in mounting a pathogen-specific immune response. Under the classic model, after initial infection, pathogen antigens recognized by the immune system are under selective pressure to change, and the host in return faces selective pressure to maintain recognition of changing targets in a process long referred to as an evolutionary arms race [5][6][7] . Over the course of the infection, immunity develops, targeting of specific antigens arises, the infection is suppressed, and re-infection by the same pathogen later hindered. On the other hand, immune responses to mycobacterial pathogens like M. tuberculosis are well-known to be more nuanced and skewed towards a cellmediated immune response 4,8,9 . Early research suggested a limited role of B cells or antibodies in protection, and although these subsets do have beneficial roles 3,10,11 , it is known that a successful immune response against MTB

Results
For Ravenel, 4130 raw epitope hits were recorded against the IEDB dataset (Table 1: Raw Hits). Of these, 2488 were a perfect match to a characterized epitope, which are considered uninformative as this work seeks epitope variation (Table 1: Matches). In Ravenel, filtering the remaining subset to epitopes showing a single amino acid mismatch vs. a characterized epitope, 100% coverage of the Ravenel hit vs. the IEDB query, and with no gaps vs. the IEDB query returned 303 matches (Table 1: Degenerate Mismatches). In most cases in the epitope dataset, sequences are M. tuberculosis var. tuberculosis-derived, and additionally, one epitope can have multiple known variants in the database-thus, degenerate mismatches-and so each mismatch was then interrogated for a perfect match against other variant epitope sequences, as well as against the AF2122/97 epitope hits to reduce the chance variants were simply fixed in the MBO background. Of the 303 Ravenel single amino acid hits against the epitope dataset, 188 mismatches were found to have a 100% match against a different variant epitope in the dataset and were therefore excluded, leaving 115 epitopes in Ravenel that differed by 1 amino acid from the IEDB sequences. Comparing these 115 epitopes to the output for the epitope workflow for AF2122/97 yielded 100% matches for 110, leaving just 5 epitopes in Ravenel that were mismatches from known sequences, and different from the same epitope sequence in the MBO reference strain AF2122/97 (Table 1: Unique Mismatches). Expanding to 2 amino acid mismatches yielded 293 degenerate epitopes, and 0 unique epitopes. Finally, 3 mismatches returned 289 degenerate epitopes, and filtered to 1 unique epitope. This process was repeated for 10-7428 and BCG-1. It is important to note that for reference AF2122/97, the process does not include subtracting epitopes that may be the wild-type in MBO, yielding an artificially high count of 121 1AA changes, 28 2AA changes, and 13 3AA changes. Table 1 reports the initial results for each of the 4 strains.
Most mismatches in Ravenel, 10-7428, and BCG-1 mapped perfectly to an existing variation in AF2122/97, leaving only a handful of changes not observed elsewhere (Table 1: Unique Mismatches). In Ravenel, five potentially impactful single amino acid mismatches were initially recorded. Epitope ID 229352, a 15aa epitope and one of only two known antigenic regions in RpfB, showed Glu263Gly. Epitope 595988 in ArgC presented Tyr20His. Epitope 597585 in RpoA showed Glu75Asp. These substitutions are provided as Supplemental File 1: Ravenel Changes. The remaining two mismatches (in epitopes 163642 and 163423) both affected the same gene, esxJ, but Figure 1. Schematic of epitope extraction workflow for MBO strain Ravenel. Epitope Input: All classified epitope sequences from any Mycobacterium tuberculosis complex members are downloaded and filtered to include linear peptide sequences. Genome Input: The MBO reference genome AF2122/97, as well as reference BCG-1 (Russia) genomes were downloaded in .fna format from NCBI RefSeq. Strains Ravenel and 10-7428 were downloaded as draft genomes. Genomes were processed by BLAST + command makeblastdb to generate searchable BLAST databases. tblastn: Blast searches were performed with default parameters, searching 4818 epitopes against each genome database to match IEDB epitopes to their counterparts encoded in the genome. Output and Filtering: Blast hits were filtered as described in "Materials and Methods" to identify epitopes with amino acid substitutions vs. the IEDB epitope and that are not seen in reference AF2122/97. www.nature.com/scientificreports/ this region showed an unusual pattern in the Ravenel assembly with the gene broken into 3 ORFs each containing partial starts and stops, with an identical broken pattern seen in the 10-7428 assembly suggesting this may be a systemic assembly error, and so changes in this gene were marked as unreliable. The three amino acid mismatch in Ravenel (epitope 100593) was also unreliable, mapping to an unknown PE/PPE family gene on an unplaced contig. These excluded sequences are listed in Supplemental File 1: Ravenel Unreliable Changes. In contrast, virulent strain 10-7428 showed a single 1 amino acid substitution, PPE42 Asn232Asp in B cell epitope 10022 (Supplemental File 1: 10-7428 Changes). Strain 10-7428 showed a 3 amino acid mismatch beyond this, again in partial esxJ hits could be an assembly error (Supplemental File 1: 10-7428 Unreliable Changes) which led to subsequent exclusion of this epitope. As expected, BCG-1 Russia showed the most variation in this analysis, a finding that likely arises both from its historical age 21,22 relative to the others in the dataset as well as its impaired functionality. Antigen 85B showed a substitution Phe140Leu that impacted four overlapping epitopes; PPE genes featured multiple variants, including three epitopes inconclusively from the highly homologous PPE18/PPE19 cluster of genes known to show sequence variation compared to their characterized T-cell epitopes 7 and one in PPE25; the virulence-associated serine protease MarP showed one; and one change was observed in an Mce family protein (Supplemental File 1: BCG-1 Changes).
The values for AF2122/97 are relative to the data from IEDB with is predominated by MTB epitopes, and so its unique mismatches values appear much greater than the other strains. A more objective comparison would be analyzing degenerate mismatches (Table 1: Degenerate Mismatches), where strains appear similar.
For observed changes in Ravenel and 10-7428, predictions of effects on protein stability by ΔΔG values were performed and recorded for observed single mutations ( Table 2) [25][26][27][28][29] . Full length protein sequences for RpfB, ArgC, and RpoA in Ravenel, and PPE42 in 10-7428 were searched by BLASTP against the AF2122/97 sequences for potential compensatory mutations, but only the originally detected single amino acid substitution was seen across the entire protein for each of these. For RpoA, given its presence in the large multi-subunit RNA polymerase complex, additional searches were performed for Ravenel's RpoB and RpoC sequences, but these were unchanged relative to the AF2122/97 reference.

Discussion
It is known that, with the exception of some members of the PE/PPE gene family, T cell epitopes are hyperconserved across the MTBC 6,7,15 . This is thought to reflect an evolutionary strategy of intentional host immune manipulation by presentation of T cell epitopes to drive specific immune responses that the pathogen can leverage to its advantage 6,13,15,17 . In this work, epitopes were studied to assess whether variation in T cell epitopes might be associated with attenuation by a loss of host immune modulatory potential. An initial investigation was performed using recently sequenced MBO Ravenel, an attenuated strain that provokes a robust immune response in cattle but does not cause lesions or persistent disease. In comparison, fully virulent cattle isolate MBO strain 10-7428 was also analyzed, along with MBO BCG-1 and MBO AF2122/97. Overall, the number of changes observed across the three strains compared to AF2122/97 was small, but was expected based on existing knowledge about T cell epitope conservation 6,15 . Analysis shows 3 changes to T cell epitopes across Ravenel, impacting dormancy, arginine biosynthesis, and the alpha chain of the DNA-dependent RNA polymerase. In contrast, no T cell epitope changes were seen in 10-7428, which bore only a single change in a B cell epitope of PPE42, an immunogenic gene already known to exhibit antigenic variation 7,30 . These results are compatible with the hypothesis that T cell variation could impair pathogenesis, though further investigation and confirmatory testing of these specific epitopes is required.
RpfB, or resuscitation promoting factor B, is one of five rpf genes known to be crucial in the transition from mycobacterial dormancy back to growth and infection dissemination. It is known that deletion of any one rpf gene still allows normal in vitro or in vivo growth but significantly impaired reactivation in a murine model of MTB infection, and furthermore that deleterious effects in infection and persistence are dramatic in a double-knockout background 31 . MBO Ravenel is capable of cattle infection, but after provoking a strong immune response it fails to produce lesions in most experimentally infected animals, unlike virulent strains 20,32 . The change (Glu263Gly) observed in epitope 229352 was earlier reported as a SNP 20 . The large, polar, charge-bearing glutamic acid to a small, flexible, non-polar glycine is a major alteration, and across homologues, position 263 is almost always either a glutamic acid, or an aspartic acid (the COG3583 consensus sequence residue). RpfB in Ravenel is unique in NCBI's Identical Protein Groups database, and TBLASTN searches of the NCBI NR database and WGS database filtered by Mycobacterium tuberculosis complex (taxid: 77643) showed no hits for this substitution outside Ravenel. Per a partial crystal structure by Ruggiero et al., residue 263 (Fig. 2, green highlight) is in a linking region connecting two different three-strand stretches of beta-sheets, and the enhanced flexibility caused by replacement of this large, charged residue with a glycine could destabilize this region 33 . However, in silico ΔΔG predictions yielded inconsistent results ranging from weakly stabilizing to weakly destabilizing ( Table 2: RpfB) and what effects this may have on the larger protein structure and function are uncertain. Since the catalytic residue and binding pocket is more than 30 amino acids upstream, it is unlikely that catalysis is directly compromised by this change, but the influence on tertiary structure and function is an open question that needs to be further evaluated. Regardless, this substitution is considerable at an epitope level, so even if protein functionality is maintained, altered immune recognition of the RpfB protein may still affect virulence. Epitope 229352 is T cell epitope shown previously to elicit a CD8 + dominant response and release of IFN-γ and TNFα 34 . As such, RpfB remains an interesting candidate for modulation of virulence in MBO Ravenel and beyond.
ArgC, or N-acetyl-gamma-glutamyl-phosphate reductase, is an oxidoreductase found in the L-arginine biosynthesis pathway 35 and well-conserved across Actinobacteria. Transposon mutagenesis in MTB has identified all Arg members as essential in vitro 36 , and argB and argF knockouts are efficiently sterilized from murine www.nature.com/scientificreports/ infection models in both C57BL/6 and immunocompromised SCID mice, with ΔargB infections of the latter group resulting in 100% survival to 300 days and complete clearance even at 10 8 CFU/mL doses 37 . Less research has been performed with ArgC by comparison, but a modest IFN-γ response against the specific epitope 595988 has been demonstrated by ELISA in Bos taurus 38    www.nature.com/scientificreports/ 30). Like for RpfB, the impact of this change on a short, linear T cell epitope seems more likely significant than a tyrosine to histidine substitution for overall protein function. While this change is rare, its presence in a small number of human and bovine clinical isolates makes it unlikely to have a major impact on pathogenesis alone. It may instead be a contributor in overall attenuation through cumulative changes in T cell recognition. RpoA, or the DNA-directed RNA polymerase subunit alpha, is a core subunit of the RNA polymerase (RNAP) complex conserved across bacteria. It is the target of the antibiotic rifampicin, and while this critical enzyme is known to be intensely conserved, in MTB in particular, changes in RpoA-RpoC are known to be associated with rifampicin resistance, especially with changes in RpoB 39,40 . It has also been found in Mycobacterium as well as Salmonella that compensatory mutations across the genes in the RNAP complex are necessary to offset the fitness deficits of resistance-conferring mutation in these essential genes 39,41,42 . In MBO Ravenel, a change was observed in epitope 597585, leading to RpoA Glu75Asp (Fig. 4). No other substitutions are seen in RpoA, RpoB, or RpoC. Curiously, RpoA Glu75Asp is seen in all the same strains that carry ArgC Tyr20His change, as well as MBO strains M1009 and M1010, both clinical isolates from slaughter of two cattle (Bos taurus) in Paraguay. Like epitope 595988, RpoA's T cell epitope 597585 has been validated in a bovine interferon gamma release assay 38 . The substitution is not one believed associated with drug resistance, and while the physiochemical differences between aspartic and glutamic acids are minimal, this change may influence immune recognition, or even still incur minor fitness costs given the fundamental role RpoA plays. Regardless, it seems unlikely this epitope would be a sole major driving factor of observed attenuation in Ravenel and might instead be a contributor among a constellation of changes.
A 4th single amino acid mismatch in Ravenel, in epitope 163423, was found to be in a fragmented, assembled sequence of EsxJ. This epitope mismatch was discarded as unreliable. Subsequently, the last 1aa mismatch (epitope 163642) was found to involve this same gene and was also discarded. Finally, the 3aa mismatch in Ravenel was determined to fall in a partial PPE family protein (WP_152345480.1) that aligns ambiguously with multiple possible PE/PPE genes. Due to the unreliable nature of these genes in sequencing and assembly, this match was disregarded.
Only one unique single epitope change was observed in the virulent strain 10-7428, a modification of PPE42 (Mb2640 in MBO). This protein is known to be highly immunogenic and stimulates a strong humoral response in human patients 43 and is one of the selected antigens in the ID93 subunit vaccine 44,45 . It is suggested that variation in some PE/PPE genes, unlike in most other antigens, may actually be beneficial to MTB in avoiding a productive, Th1-dominant host immune response 44 . In strain 10-7428, epitope 10022, one of three recognized in PPE42, presented with Asn232Asp. This modification is seen in only one other sequenced isolate. Interestingly, 10022 www.nature.com/scientificreports/ is a validated B cell epitope unlike the exclusive T cell epitopes changes seen in Ravenel. In contrast with other PE/PPE family genes analyzed in this work, PPE42 is a relatively distinct gene and sequencing, and placement of this substitution is clean in the Ravenel assembly. However, as no in vitro crystal structure has been obtained for this protein and PE/PPE genes are often poorly understood regardless, any interpretation of this finding must remain limited absent further investigation in vitro. In Ravenel, an additional epitope change was reported that also arose from an atypical assembly pattern around esxJ, and this epitope was discarded as an artifact.
The Antigen 85B complex has been known for decades to be a dominant secretion product and immune stimulator 46,47 . Antigen 85B within the MTBC is strongly conserved, with more variation across NTM but noted antibody cross-reactivity between all variants 48 . In BCG-1 Russia, a single substitution was observed, Phe140Leu, that resulted in a change that appears fixed across all BCG strains, with very few non-BCG examples from lab or clinical strains. This single amino acid change modifies four characterized epitopes (149353, 196212, 16926, and 18898) that all overlap this position.
MarP is a serine protease involved in maintenance of intracellular pH during phagocytosis through the processing of peptidoglycan through its interactions with RipA 49 . Loss of MarP or its catalytic residues leave the bacterium severely impaired for pathogenesis in vitro and in a murine model of infection 50,51 . In BCG-1 Russia, epitope 600022 of MarP carried Asn165Thr change, a mutation in the flexible linker region between transmembrane anchor and protease domain. Like the Ag85B mutation, this substitution is also fixed across BCG strains and appears again only in the same small number of lab and clinical isolates, raising the possibility these isolates may be evolutionarily similar to BCG, or cases of BCG-osis.
Mce (mammalian cell entry) family proteins are a class of protein involved with entry into host cells and interference with host cell signaling pathways, particularly through the ERK1 and ERK2 MAPK pathway for inducing cytokine production 52 . T cell epitope 20707 presented with Thr46Pro. At a molecular level, the sequence change is another significant structural alteration, though whether this affects the function of this protein or its immune recognition in BCG-1 are beyond the scope of this work.
PE/PPE family changes were numerous in BCG-1, but as before, they are difficult to rely on and are reported only with the caveat that their true sequence and placement in the assembly are unclear.
Finally, in silico predictions for ΔΔG were variable and conflicting (Table 2), a known problem in the field 53 . The use of these tools may also be complicated by the specific portions of molecules being studied-surface residue substitutions have been shown previously to be more poorly predicted than changes to residues buried within the protein 54 , and epitopes herein were exposed residues. Despite this, a message remains in these contradictory findings: no consensus changes of a magnitude > 0.5 kcal/mol in either a stabilizing or destabilizing direction was measured for any affected protein in Ravenel or 10-7428, and it is therefore less likely that any observed substitutions are sufficient to modify an affected protein enough to ablate functionality. This investigation is premised on the idea that changes to epitope recognition can lead to dysregulation of host manipulation by Mycobacterium species. Indeed, if mutant proteins retain their molecular functionality but still alter virulence, modified epitope recognition is a means of describing this contradiction.
Using epitope data gleaned largely from M. tuberculosis var. tuberculosis, and after sequencing a uniquely attenuated M. tuberculosis var. bovis genome in strain Ravenel, the haystack of 4130 epitope alignments yielded 3 needles of epitopes with non-synonymous changes in characterized epitopes. Two of these epitopes have been validated in bovine IFN-γ release assays, and one in a murine model showing IFN-γ and TNF-α release. As recent work has demonstrated, MBO Ravenel provokes potent cell-mediated immune responses in the bovine host, but fails to produce granulomatous lesions in nearly all cases, and the process of an infection leading to granuloma formation is known to be tightly associated with the precise presentation of specific T cell epitopes at specific times 13,20 . In contrast to these, an isolate from a dairy cattle outbreak-MBO 10-7428-was found to contain no T cell epitope variation, and instead the only change fell in a B cell epitope in a gene where variation is associated with increased virulence. These results warrant the further investigations into how these changes in T cell epitopes may be an indicator of attenuation in MBO Ravenel and other strains, an investigation of their role in pathogenesis, and whether any of these epitopes may prove useful as subunits for vaccine development, as has already been proposed for RpfB 34 . An essential next step towards validating the biological implications of this work would be in the testing of wild-type vs. altered epitopes in a bovine T cell assay utilizing samples from previously infected cattle. Detectable differences, positively or negatively, in T cell activation between these epitopes would support a role in attenuation and merit closer scrutiny into the immunological effects. The analytic process described here is simple, and with limited development could be readily incorporated into standard comparative genomics analysis for MTBC organisms, increasing the amount of information researchers can extract from each experiment.

Materials and methods
Immune epitope database (IEDB) data accession and filtering. The US National Institutes of Health and the Department of Health and Human Services jointly support and operate the Immune Epitope and Analysis Database (IEDB), a comprehensive database of over 1.5 million experimentally validated epitopes from a range of species and diseases 55 . The IEDB (accessed September 2022) was used to collect all epitopes validated in the Mycobacterium tuberculosis complex (ID:77643, n = 4894), primarily from Mycobacterium tuberculosis var. tuberculosis (n = 4111) and M. tuberculosis var. bovis (n = 783), including 356 M. tuberculosis var. bovis BCG strains 55 . These sequences include non-peptide targets like LAM which were removed (n = 46), B cell-specific discontinuous peptide sequences (n = 2), as well as peptides that had undergone post-translational modifications (n = 28), both because they are unable to be processed through tblastn and because the presence or absence of these modifications could definitionally not be verified by screening against whole genome sequences.  56,57 . Two of these strains of M. tuberculosis var. bovis were sequenced and analyzed by our lab: attenuated Ravenel (GCF_018305025.1) and virulent 10-7428 (GCF_018305045.1) 19 . From existing databases, the attenuated reference BCG-1 (Russia) (GCF_001483905.1), and virulent reference AF2122/97 (GCF_000195835.3) were selected. The BLAST + package (v2.11.0) 57 , preinstalled on the MSU HPCC, was loaded and the makeblastdb command used with default parameters to generate local databases based on each of the four genomes mentioned. All tblastn searches were with default BLAST parameters, with the filtered list of 4818 epitope sequences searched against each genome database 57 . Some highly repetitive epitopes-particularly those from PE/PPE genes-were caught by filtering, Karlin-Altschul parameters were not calculated by the BLAST algorithm, and they were subsequently excluded from analysis. The tblastn output was produced twice, once in pairwise-alignment format, and once in tabular format. After processing below, epitopes that were unique to our dataset were also TBLASTN searched against the broader NCBI WGS database to determine their presence or absence across all sequenced strains.
Output filtering and processing. Tables were viewed in Excel, sorted by number of mismatches, sequences with gaps were excluded, and the subsets of homology hits with 0, 1, 2, or 3 mismatches separated into their own respective sheets. The XLOOKUP function was used to find whether any mismatched epitopes had a 100% match against any variant epitope in the Match sheet, and epitopes that appeared only to vary from the IEDB listed sequences-that is, epitopes that truly do not perfectly match known epitope sequences in IEDB-were moved into their own sheets. Finally, for strains Ravenel, 10-7428, and BCG-1, the list of epitopes that varied from IEDB sequences was queried against the same list of TBLASTN searches from MBO reference strain AF2122/97 in order to identify and dismiss candidate epitopes that were identical to the MBO reference and may simply represent the MBO genetic background. The pairwise BLAST results list was then analyzed for each remaining mismatch, epitope IDs queried on IEDB for details, and results recorded. Finally, BLAST searches were also performed with the full-length proteins against their H37Rv and AF2122/97 counterparts to explore potential compensatory mutations. A schematic representation of this workflow is presented in Fig. 1.
In silico protein stability prediction software. Unique epitopes in Ravenel, 10-7428, and BCG-1 Russia were queried through UniProt and NCBI, and effects on protein stability examined by three high-performing software tools benchmarked by Pancotti et al. 58 (2022)-DDGun3D, PremPS, and INPS-as well as an additional recently released tool, DynaMut2, using in vitro crystal structures preferentially and DeepMind's Alpha-Fold in silico predictions when in vitro structures were not available [25][26][27][28][29] .

Data availability
Genome data used in this study are available through NCBI BioProject PRJNA713797. GenBank accession numbers are as follows: for raw reads, SRX10318108 for Ravenel and SRX10318109 for 10-7428; for post-PGAP sequences, JAGEUB000000000.1 for Ravenel and JAGEUC000000000.1 for 10-7428. All analyzed data are presented in tables, figures, or supplemental files in this study.